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A Novel Sparsity-Based Approach to Recursive 
Estimation of Dynamic Parameter Sets 
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Abstract 

We consider the problem of estimating a variable number of parameters with a dynamic nature. A 
familiar example is finding the position of moving targets using sensor array observations. The problem 
is challenging in cases where either the observations are not reliable or the parameters evolve rapidly. 
Inspired by the sparsity based techniques, we introduce a novel Bayesian model for the problems of 
interest and study its associated recursive Bayesian filter. We propose an algorithm approximating the 
Bayesian filter, maintaining a reasonable amount of calculations. We compare by numerical evaluation 
the resulting technique to state-of-the-art algorithms in different scenarios. In a scenario with a low SNR, 
the proposed method outperforms other complex techniques. 

Index Terms 

Recursive Bayesian filter. Target tracking. Sparse estimation. Compressed sensing 

I. Introduction 

Estimating a dynamic set of parameters is a highly useful and wide area of research, with a long 
and fruitful history HI. Indeed, noticing the ever increasing application of the Kalman filter and its 
variants to many newly developed technologies is enough to understand the importance of this topic. 
In this context, the quest for modified techniques usually concerns cases where either the currently 
existing methods fail to meet the computational limitations, or result in an insufficient precision. The 
latter may also be either due to an inconsistent model, on which the technique is based, or simply 
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because of improper approximations. From this perspective, one finds certain estimation problems, for 
example the ones concerning data generated by a sensor array, more challenging. The reason is that the 
associated models, being capable of capturing the desired properties of the parameters, are so complicated 
that standard design methods by them lead to computationally intractable techniques. Thus, appealing 
to proper approximations is inevitable in those cases. This paper addresses these problems and aims 
to provide a modified approximafe esfimafion fechnique. The emphasis here is on mainfaining a low 
computational complexity, while maintaining the statistical properties of the estimates. 

The central idea in estimating a time varying parameter is that a parameter following a well-structured 
temporal model has locally correlated samples. Thus, they can be fused to improve the quality of 
estimation for a specific sample. This is particularly known as parameter filtering ||2l. The basic ideas 
of filtering can be easily observed in the pioneering studies of Wiener, initiating the field of adapfive 
fillering ||3l. Lafer, fhe seminal work of Kalman framed adaptive filtering into a rigorous statistical context, 
and showed a case, where statistically efficient estimates could be exactly calculated by a recursive 
method ||4l. Soon after Kalman, Ho and Lee generalized this idea to the so called Markov Chain (MC) 
models, comprising of parameter evolution and measurement models Q. Their solution is generally 
called Recursive Bayesian Filtering (RBF). The main advantage of the RBF is that it is highly adaptive 
to different application specifications, including a non-stationary behavior ||6l. However, it requires storing 
and integrating posterior densities. Approximate techniques such as the Extended Kalman Filter (EKF) 
Q, im and Unscented Kalman Eilter (UKE) |[9l are commonly used to perform this. Due to their local 
nature, they perform poorly, when multi-modal distributions are considered. The advent of statistical 
sampling and Monte Carlo methods provided an alternative method of implementing recursive Bayesian 
filters, by the so called Markov Chain Monte Carlo (MCMC) method. The resulting filter is generally 
known as the particle filter ifTOl - lfT^ . 

The difficulty arises in applying the above to problems such as radar detection, where the data is 
generated by a sensor array. This is due to multiple reasons, discussed in the sequel. The first reason is 
that a MC model is not directly applicable. To elaborate on this, note that the corresponding measurement 
model for data generated by a sensor array consists of two distinct set of parameters, known as amplitude 
and position parameters. In many applications, the amplitudes evolve rapidly in time, resulting in highly 
uncorrelated samples. Thus, only in the sense of position parameters one may perceive a Bayesian filter. 
To remain in the realm of RBE, it is still necessary to handle the amplitudes in a Bayesian manner. The 
second reason is that the observation model of the applications of interest is nonlinear, and estimation 
through them usually leads to the local minima problem. In the same manner, nonlinearity results in 
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posterior multimodality, which not only complicates estimation, hut also makes the posterior calculation 
difficult. The third reason is related to the fact that the time evolution model of the position parameters 
concerns varying order. Take the radar example. In the course of ohservation, it is perceivable that some 
targets may he introduced or removed from the ohservation scene. In a more elaborate model, a single 
target may spawn multiple future targets. A MC model capturing the dynamics of such a system is 
complex and its corresponding sequential Bayesian filter can only be derived in an abstract form. To 
reduce the computational cost without introducing too much error, this filter needs to be approximately 
parametrized. This is generally a challenging task. 

A. Literature Survey 

Due to the above, one may find different approaches in the literature to recursive filtering of the 
sensor array data. According to different representations of the problem of interest, these methods are 
developed under different names. More specifically, the parametric (Kalman filter-based), spectral-based 
and subspace-based representations give rise to filtering techniques under similar titles. Some spectral 
based techniques can be found, e.g in lIT^ - ITSl . The subspace tracking approaches have also been recently 
studied and applied in the literature lIT^ - llTSl . The semi-parametric sparsity-based techniques are also 
rapidly emerging in literature under the title of sparsity tracking |[T9l - ll2n . The filtering techniques can be 
also categorized from a different perspective. Many studies consider a case where preliminary parameter 
estimates are provided, relying only on their corresponding data. This is called target tracking and is 
favorable in occasions, such as some radar detection problems, where only the preliminary estimates are 
accessible for process |[2^ - |[2^ . In contrast to target tracking, the recent attempts to directly use sensor 
data to perform parameter filtering is often referred to as Track-Before-Detect (TBD), but this is not a 
generic term ||25| . Il26l . 

The above techniques deal with the aforementioned difficulties in different ways. The target tracking 
and the subspace tracking techniques do not suffer from lack of amplitude models, while other TBD 
approaches either assume a specific amplitude model, depending on the application or eliminate them by 
assuming a Bayesian model and integration |[27l . The amplitude models usually involve hyper-parameters, 
for which simple time evolution models are considered. The parametric formulation is the most precise 
likelihood based approach ifTTl . but is numerically sensitive to nonlinearity. The Joint Probabilistic Data 
Association (JPDA) and Probabilistic MultiHypothesis Tracker PMHT Il28l methods are popular examples 
of parametric target tracking ||2^ . Instead, the methods leading to spectral estimation such as 

subspace-based and sparsity-based techniques trade off precision in favor of numerical stability. Moreover, 
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particle filtering is nowadays a common approach to overcome multi-modality ISTl . Concerning the issue 
with variable order, many related studies consider a fairly general model, where the parameters have a 
fixed prohahility to survive, disappear or appear at the next time instant. In the recent literature, this is 
formulated as a Random Finite Set (RFS) model, also considered here and referred to as the standard 
model l32l . The RFS based representation not only provides a formal definition of the time evolution 
model, but also suggests certain approximation techniques. For example the Probability Hypothesis 
Density (PHD) filter provides a method to overcome the so-called data association problem in target 
tracking through approximating the RFS-based posteriors by a Poisson process |[3Tl . The data association 
problem is due to the fact that the preliminary estimates are not generally labeled by their corresponding 
true parameter. More elaborate examples of such can be found in 1341 . 

B. Motivation 

In the problems of interest herein, the RBF approach needs to be approximated and the performance of 
all the techniques in the prequel is limited by the precision of their underlying approximation. From this 
perspective, these techniques can be divided into three groups: The locality based approaches such as EKF 
and UKF, the ones based on stochastic sampling, i.e. particle filters, and other model-based approximations 
such as the ones in the PHD filtering. The latter is normally based on minimizing the Kullback-Leibler 
(KL) distance between the resulting posteriors and a parametrized model set, which is applicable only 
if the minimization has a tractable solution. Clearly, the choice of approximation depends on the type 
of filter. For example, a locality based approximation is not appropriate for parametric filtering, where 
multiple local minima are present. In general, particle filters are always applicable, but need a higher 
computational effort (number of particles) than the other techniques to provide the desired precision. The 
precision of the methods such as the PHD filter depends on how well the approximate model fits to 
the exact one. Practically speaking, this restricts such methods to a high SNR or a slowly varying case. 
Moreover, the target tracking performance is also dependent on the quality of the preliminary estimates, 
which considerably limits the SNR range of application for these techniques. 

In this paper, we study a different opportunity provided by the findings in the field of sparsify-based 
estimation, especially the Least Absolute Shrinkage and Selection Operator (LASSO) |[3^ - |[38l . Recently, 
the inspiring work of Stoica et al in Il38l . If39l has provided an important Bayesian insight into this 
approach, which we slightly modify here to fit the RFS framework. Using this model for observation and 
considering the standard RFS based time evolution model, we investigate on the resulting RBF. The RBF is 
again intractable and needs approximation. On the other hand, it is observed that the convexity of LASSO 
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yields to unimodality of the posterior distributions. Thus, it is favorable to use local approximations, 
similar to EKF. We develop a local expansion technique performed on the abstract space of finite sets 
and apply it to the proposed RFS, leading to a tractable filter. 

C. Mathematical Notation 

In this paper, M, M+ and C refer to the set of real, non-negative real and complex numbers, respectively. 
The notation Tr(.) denotes the trace operator and |.| shows either the absolute (in the case of a numerical 
argument), or the cardinality (in the case of a set argument) of the argument. Moreover, (.)+ denotes 
the positive part of its real argument. We also define an assignment R between finite sets A and B as a 
subset of A X B satisfying the following conditions 

• V(ai, bi) € R, ( 02 , 62 ) ^ R', ai = 02 —> 61 = 62 

• V(ai, bi) € R, ( 02 , 62 ) R', 61 = 62 oj = 0,2 

Moreover, we define the domain sets of R as the elements in A and B, included in R, i.e. 

• di{R) = {a £ A \ 3b £ B, {a, b) £ R} 

• d 2 {R) = {b £ B \ 3a £ A, {a, b) £ R} 

Throughout the paper, + and — subscripts or superscripts denote parameter values right after and before 
an observation, respectively. The primed parameters are usually related to the ones at a previous time 
instant. The notation p{.) denotes the probability density function (pdf) of its argument and Q (.,.) 
represents the transitional probability between consecutive samples. 

II. Problem Formulation 

A. Observation Model 

Consider a compact subset 0 C M and a smooth basis manifold a : 0 —>■ Further, consider a 

vector data set {x(f) £ observed through the following model: 

rit 

x(f) = ^a( 6 'fc(f))sfe(f) + n(f) ( 1 ) 

fc=i 

where t is the time index, the sets {9k{t) £ 0} and {sk{t) £ C} are called position and amplitude 
parameters, respectively and {n(f)} denotes the additive noise, assumed to be a centered Gaussian, white 
and stationary process, with covariance matrix a‘^1 . Notice that nt, the number of parameters involved 
in modeling x(f), also known as order, can be variable in time and is seldom a priorly known. The 
aim is mainly to estimate n* and the position parameters ({ 0 fc(f)}), since they often carry the desired 
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information. However, since the model in ([T]) is linear in the amplitude parameters, once the position 
parameters are replaced hy their estimates, a standard linear estimator such as Ordinary Least Squares 
(OLS) may he used to estimate the amplitudes. The problem of estimating rit is often called model order 
selection. 

More formally, the observation model in ([Til can be written as 

I St) = (2) 

where the finite set St, given by 


St = {(6»1 (f), Si (f)), (02 (f),S2(f)),... ,(6'n,(t),Sn,(f))}. 


( 3 ) 


represents the state. We further assume that the amplitudes Sk{t) are distributed by a centered Gaussian 
pdf with variance Xfc(f), which we refer to as intensity. This can be formally written as 


1 

p{sk I Ikit)) = ~ 


( 4 ) 


TTlkit) 

After straightforward manipulations, combining ([2]) and (ID), and integrating over Sk leads to the following 
likelihood function in terms of the position and intensity parameters. 


p(x(f) I St) = 


1 


=-x"(t)R ^t)x{t) 


where 


vr™ det(R(f)) 

St = {{ei{t),iiit)), ( 02 (f),X 2 (f)),..., {enAt)Mt))} 


( 5 ) 


( 6 ) 


is a new state representation, here called the hyper-state, and 


R(f) = R{St) = <7^1 + J]Xfc(f)a(0fc(f))a^(0fc(f)) 


( 7 ) 


k=l 


The recent findings in fhe field of sparsity-based estimation suggest to substitute the determinant 
term with an exponential function to obtain 

—x^(t)R~^(t)x(t)—Xo A 


p(x(t) I 5t) oc e 


( 8 ) 


where Aq is related to the average number of parameters, and practically treated as a design parameter. 
Considered in this work, the model in ([8]l leads to a convex ML estimator, known as S Parse Iterative 
Covariance-based Estimation (SPICE). Moreover, the convexity of the negative log-likelihood leads to 
unimodality of the posterior distributions. 
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B. Time Evolution Model 

For the applications of interest herein, it is not suitable to consider an evolution model for the state 
St- Instead, a motion model for the hyperstate St is considered. The motion model is a Markov chain, 
represented hy the transitional prohahility density Q{S,S') = p{St+i = S \ St = S'). It assigns to any 
pair of finite sets {S, S') a value, quantifying the likelihood of S' being followed by S. Note that we 
consider a temporally constant transition function Q, corresponding to a stationary Markov Chain (MC). 
Then, the joint p.d.f. of the sequence of state sets over an arbitrary window {fi, fi + 1,..., 12 } of time 
is given by 

p{St,,St,+i,St,+2 ... = 

Pt,iSt,)Q{St,+i, StMSt,+2, St,+i)... Q{St,,St,-i) (9) 

where pt^{Sti) denotes the marginal state distribution at the initial time ti- We focus on a specific 
transition probability, associated with a case, where the elements of St may first independently disappear 
with a small probability a. Then, the surviving elements may be modified by scalar models po(^t+i = ^ I 
6t = 9') , pi{It+i = I = I'), and finally some new independent elements may be added according 
to a Poisson process with the hypothesis density function 6(9,1). This means that a new parameter may 
independently appear in a small neighborhood AA of a point (9,1) with probability d(9,l)d(Af), where 
d(AA) is the volume (Lebesgue measure) of Af. Note that 

j 6(9,l)d9dl < 00 . (10) 

0xR+ 

is the average rate of parameter birth, here assumed to be small. Then, the transition probability Q(S, S') 
is given by 

Q(St+i = S,St = S') = e-^ E 

ReT{S,S') 

alS'l-|R|(i _a)l«l n Po(9 \ 9')pi(I \ I') n (11) 

{e,X,9',X')€R 8^di{R) 

where each summand is defined by an assignment R between the elements of S and the elements of 
S'. Note that \S'\ — |i2| is the number of removed parameters from S', and the set 9 ^ di(R) contains 
the newly introduced parameters in S. Hence, the three product terms in the summand evaluate the 
probabilities of survival, alteration and birth, respectively and according to the assignment R. The question 
of interest herein is to provide a filter, estimating the set St at each time t based on the observations 
x(l), x(2),... ,x(f), the observation model in ([8]) and the MC motion model given by the transition 
probability in (fTTl) . 
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III. Recursive Bayesian Filtering 


The model in (|9l) enables us to solve exactly the desired estimation problem in a recursive way. 
Denoting X^^'> = [x(l),x(2),... ,x(t)], we observe that the best estimate, in the Maximum A Posterior 
(MAP) density, for St based on the observations up to time t, i.e. is given by maximizing the 

conditional likelihood p{St \ The special form of the MC model in (|9l) allows to recursively 

calculate p{St \ X^^'>) by applying the Bayes rule: 

p(x(t) I St)p{St I 

/p(x(f) I St = S)piSt = S I X(‘-i))dS 
<s 

where S denotes the entire space of the hyper-states, discussed in Appendix and 


p{St I = j Q{St, St-i = S)p{St-i = S I X(*-I))d5 (13) 

5 

The resulting recursion is simple: Given the conditional distribution p{St-i \ at time instant t—1, 

calculate the prediction distribution p{St \ by (fT3l ). Then, use (fT^ to update the conditional 

distribution to p{St \ X^^'^). As seen, the denominator in (fT^ is independent of St- Thus, it can be 
replaced by any other scaling factor, without affecting the final result of MAP estimation, simplifying 
the calculations. This is called recursive Bayesian filtering. 

The difficulty in the above method is to store the conditional distribution and calculate the integral in 
(fT3l) . Our method here is to consider the following family of approximate distributions, parametrized by an 
arbitrary symmetric positive semidefinite matrix R and an arbitrary positive weight function A : 0 —> M+ 
as follows 

p(5; R,A) =exp-Tr(RR-^) - ^ X{9)I (14) 

{e,x)es 

where 

R = Ti{S) = £7^1 + Xa(0)a^(0) (15) 

ie,x}es 

We approximate the posteriors by selecting the closest distribution in the KL sense in this family. We 
denote the parameters of the closest distribution to p{St \ and p{St \ by (R^,A)") and 

(R^,A^), respectively. 

The distribution in (1141) is necessarily unimodal. Moreover, when R and A are large, it is highly 
concentrated around its global maximal point, called the Maximum A Posterior (MAP) hyper-state 
estimate. When the updated distribution p{St \ X^*)) is considered, the resulting MAP estimate is the 
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filter output (the desired estimate). When, p{St \ is instead considered, the MAP estimate is 

called the predicted hyper state. 


A. Calculating the MAP Hyper-State Estimate 

One of the advantages with the above choice of approximate distribution is that it simplifies calculating 
the maximum a posterior estimate. When the posterior p{St \ is calculated and approximated by 

parameters (R^,A^), the hyper-state MAP estimate is calculated by 

4 = arginaxp(5t I R+, (16) 

SeS 

Similarly, the MAP predicted hyper-state is defined by 

<5r = arginaxp(5t | R^,AJ“) (17) 

SeS 

Both optimizations in (fTTl) and (fT^ yield to 


Tr 



St = arginin 
S£S 


-1 


E Xa(0)a^(0) 
(0,x)eS , 




+ E 

(9,i)eS 


(18) 


where the plus and negative sign is for (fT^ and (fTTl) . respectively. The optimization in (fT^ is a type 
of sparsity-based estimator and can be solved fast and precisely, with the so called weighted SPICE 
technique. First, a fine grid {0i, 02,, 9n} over 0 is considered. Then, the following convex optimization 
is solved and the non-zero elements are selected as the estimates. 


min 

N 

+ E ik^tiOk) 

k=l 

(19) 

The optimization in (fT^ can be solved either by the off-the-shelf techniques, such as the CVX toolbox, 
or by the specific technique explained in ll^ . 


Tr 


N 


a' 


'1+ E2:fea(0fc)a^(0fc) ) R; 


k=l 


-1 
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B. Update Step 

Assume that at a certain time instant t, the posterior p{St \ is approximated hy parameters 

(R^, A^). Once the vector x(t) is observed, the posterior is changed according to (IT^ . which using ([8]l, 
results in 


We obtain that 


p{St I xW) oc 

-x«‘(t)R-i(I)x(I)-Tr(R-R-ip))- E (A^ (6»)X+AoI) 

(fl,z)est 


exp 


-Tr 


R-i(t) 


Rj + x(i)x'^ 


- E (Ar(0) + Ao)X 

{0,x)eSt 


R^ = Rj +x(t)x'^(t) 
A+( 0 ) = A,-( 0 ) + Ao 


( 20 ) 


( 21 ) 


C. Prediction Step Approximation 

Now, consider occasions where the posterior p{St+i \ X(0) is to be calculated by (fT3] ). Assume that 
the posterior p{St \ X^)) is approximated by the parameters R^ and A^ and that these parameters 
are large enough, such that the corresponding posterior is highly concentrated around the filter output 
St- In this case and according to Appendix |Bj is a result of perturbing the parameters of the MAP 
hyper-state estimate with a Gaussian perturbation, followed by adding extra elements (0{,X^), distributed 
by a Poisson distribution. For simplicity, let us denote St = {(0i,Xi),..., {9n,In)} and denote by A6k 
and AXfc the perturbations in 6^ and respectively. Then, according to the extended Laplace’s method, 
derived in Appendix IB-Bl we may approximate p{St \ : 

~ AA(0, G^^), AXk ~ A((0, 

{{elxl)} ~ Poisson(w(0,X)) (22) 


where 


G, = 


_ a^Tr(R+R^i) 


Tj a^'TrfR+R-i) 
= ^ 


■j\ _ I a(0)^R_^At)R+R_i_Ut)a(0)I _ \ + 


(23) 
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Simple calculations show that after applying time evolution hy (fTSl) . the approximation in (1^ still 
holds, hut the parameters Gk,Hk and uj{9,T) are updated (See ll^ for the Poisson Process under time 
evolution) to 




Gi 


Gk TSft _ Hk 

1+a'iGk ’ — l+a^Hk 


(1 - a) / u{9',T)po{9 I 9')pi{I \ T)d9dl + 

0xK+ 

5{9,I) 


(24) 


respectively, where and fjj are the perturbation variance, associated with the time evolution models 
Po and pi, given hy Var(0t | 9t = 9k) and Var(Zj | It = Ik) respectively. This represents the posterior 
distribution after time evolution. Now, we project this distribution on the desired space of parametrized 
distributions by R and A. We perform this by taking the minimum KL distance. Although the process 
is generally intractable, assuming that time evolution is small, i.e. the hyper-state does not change fast, 
the process can be easily performed by perturbation theory. Appendix |Cj establishes this relation. Here, 
we consider the final result, where limited computational complexity is also considered. The simplified 
prediction steps can be represented by 


^t+i ~ 


-+E' 


2n 


-R7 


Aj;i(0) = [A+(0)- 

^(A(0) - a^(0)R;i(f)R+R;i(f)a(0)) 


where Si{9) = f Id(9,I)dI. The overall proposed algorithm is summarized in Algorithm [T] 

R+ 


(25) 


IV. Numerical Results and Comparison to Related Works 

In this section, we examine the method developed in Section |III] in a number of selected scenarios 
and compare the results on the synthetic data to other filtering technique. We consider the problem of 
Direction of Arrival (DOA) estimation with a Uniform Linear Array (ULA), where the position parameter 
is the direction (angle) of an electromagnetic source and the amplitude is the complex envelope of the 
electromagnetic wave transmitted by it and the observation vector is the signal measured at a ULA of 
m = 20 sensors. The DOA is often reparametrized for simplicity, introducing the electrical angle, which 
we utilize here. Then, ([T]) holds with 

a(9) = [1 (26) 


where 6* € 0 = [—tt vr] is the electrical angle. 
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Algorithm 1 The proposed algorithm. 

Require: A positive definite matrix C and a positive function (ii(6l). 

Initialize hy a symmetric positive definite matrix and a positive function A^(0). 

Set t = 1. 

repeat 

Observe x(f) and calculate and from (|2TI) . 

Calculate St by solving its corresponding SPICE optimization in ([T9l ) and selecting nonzero 
elements. Calculate R+(f) = R(5i). 

Calculate R^^ and A^j^ from (|25]) . 

Set t f -f 1- 
until Required. 


In all simulations, we use a Gaussian MC model for parameter evolution, i.e. 

1 


and 


Po{9 I 0') = 


Pi{X\X') = 


( 0 - 0')2 


27rag 


(Z X')2 


277 CJj 


(27) 


(28) 


We also perform the calculations over the spectra (e.g. A(0)) in the recursive algorithms of interest, 
by taking a uniform grid 0 over 0 with minimum separation 0.01. This results in 629 grid points. The 
average false alarm power (5i(0) is also selected uniformly over 0, i.e. 6{9) = 6. 


A. Related Studies 

In the literature, there is a number of different studied approaches, applicable to the problem of interest 
herein. We briefly review some of the more popular ones, considered her for comparison. 

1) Sliding Window Techniques: In the simplest case, a temporal window is considered, which is 
generally defined by a window function m,- for r = 0,1,.... At a given time t, the following optimization 
is solved 

(0i(f),02(t),...,0n(t))=arg min ^ min 

6^1,6^2v:C^n Sy (t) ,^2 (^ ), • • • (^ ) 

n 2 

-'t) - E ^{^k)sk{t - t) 

k=l 2 

(29) 


E 

T = 1 


Wr 
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Then, the position parameters 9k{t) of the global minimum point is the filter output. Notice that the 
summation in the cost of (l29l) is over time, hut the parameters Ok are not time dependent. The motivation 
for (l29l ) is that the error in assuming constant position parameters can he approximately modeled hy the 
increase in the noise variance with the factors {tOAi}- Optimizing (l29l ) is equivalent to solving the ML 
estimator for such an approximate model. Also, note that the order n is fixed. In practice, where the order 
is typically unknown and variable, (|2^ is solved for a variety of orders. This can be efficiently done, e.g. 
by the RELAX technique ||40l. Denoting by Vn the optimal value of (l29l ). the order and its corresponding 
solution is selected by a rule over the collection {Vn}, generally called information criterion. We consider 
a popular choice of information criterion, given by minimizing 


min Vn + kn (30) 

n 

where A: is a design parameter. The choice of k for asymptotic cases and other information criteria 
are discussed in ll4Tll . When WAt = So,At, i-e it is non zero, only when A = 0, the optimization 
in (l29l ) simplifies to the exact ML estimator based on the observation model. We refer to this as the 
’’instantaneous” estimator. 

The inner optimization in (l29l ) can be solved analytically to obtain 


{9i, 92, ...,9n) = arg max Tr 

01 , 02 ,..., 0 „ 


RjP 


A(0i,...,0„) 


(31) 


t 

where ^ — Af)x^(f — At) is the windowed sample correlation matrix, A(0i,..., 9n) = 

At=l 

[a(0i),..., a(0„)] = A and Pa( 0 i,..., 0 „) = A(A^A)~^A^ is the projection matrix into the range 
space of A, also known as the signal space. Solving (|3T]) is still difficult, but the following approximate 
technique can be used: First, the closest projection matrix P^ to Ry in the Frobenius distance is found 
as 


Pt = U„,TU^r 


(32) 


where \Jn,T is the collection of the eigenvectors related to the n largest eigenvalues of R^. Then, the 
closest bases a(0) to the range space of Py is selected by taking the local minima of the spectrum 
ut{9) = ||a(0) — Pra(0)||2. This technique is called Multiple Signal Classification (MUSIC). 

2) Target Tracking Techniques: From one perspective, the target tracking techniques are to enhance 
the quality of estimates provided by other methods, such as the instantaneous estimates. Suppose that an 
instantaneous estimator is utilized to obtain a preliminary set of estimates Zt = {9i{t), 92{t),..., 9ht{t)}- 
Then, the estimates can be related to Xt through the analysis of the instantaneous estimator, leading to a 
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conditional pdf p{Zt \ Xt). As seen, the resulting model is again RFS based. Most often, the following 
approximate relation, very similar to the evolution model in (fTTl) is considered. 

p{Zt I Xt) = e->^ Z 

ReT{Zt,Xt) 

p\R\^l-P)\XA-\R\ pi{B\e) n (33) 

id,e)eR efdi{R) 

where /3 is the probability of detection of a parameter, pi{9 \ 9) is the distribution of an estimates 
9, corresponding to the true parameter 9, and p,{9) is the hypothesis density for the false alarm (false 
detection) process, assumed to be a Poisson process. Note that 

/r = y p{9)d9 < oo (34) 

0 

is the average false alarm rate. Given (fTTl) and (l3^ . we may use ([12]) and (fTSl) to obtain a recursive 
filter, called target tracking filter. The exact result is generally numerically intractable. To maintain a 
limited amount of calculations in the course of target tracking, the method of Probability Hypothesis 
Density (PHD) 13^ approximates the resulting posterior distributions by the Poisson process, leading 
to the following steps: Denoting by and , the PHDs for the updated and predicted posteriors, 
respectively, the prediction in (fT3] ) is exactly resolved to give 


Dt{9) = aj po{9 I 9')D-_^{9')d9' + 5{9) (35) 

0 

and the closest approximation in the Kullback-Leibler sense to the result of the calculations in (IT^ is 
found to be 


Dri9) = {l-P)D+{9)+Yl 


f3pi{9 I 9)Dt{9) 


eaZt 


Pjpi{9\9)DtmO + m 

0 


(36) 


The final esfimafes are given by local maxima of D^{9). 

3) Subspace-Based Techniques: Anofher fype of recursive fillers is infroduced, based on Ihe subspace 
techniques such as the previously introduced MUSIC method. The idea is to replace Xt = {9k{t)} by 
the subspace JZ’, spanned by the bases {a(0fc(f))}. The subspace is represented by a projection matrix 
P(t). An effective way to estimate P(t), also considered here is to solve 


P(f) = argimn ||x(t) — Px(f )||2 + a||P — P(t — 1)|||' (37) 

where P(t — 1) is the estimate at the previous time instant and a is a design parameter. Once P(t) is 
calculated, the parameter estimates are obtained by the MUSIC technique. Note that this technique is 
loosely tied to the statistical model, stated in Section [III though it enjoys a remarkably low computational 
complexity. 
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B. Numerical Results 

Now, we consider the introduced techniques and the proposed one in some scenarios. For the PHD 
observation model, we also choose 

1 (S-ef 

p{9 I 6) = -j=e ^ (38) 

where we treat as a tuning parameter. The instantaneous estimator for the target tracking technique 
is RELAX with the information criterion in (l30l) and k = 3. 

1) Two Crossing Targets: In this setup, two moving sources (0i, (f), 02(f)) were considered. They 
moved according to the equations 9i = —7r/2 + 0.0l7rf and 02 = 7r/2 —O.Olvrf for t = 1, 2,..., T = 100. 
Their corresponding amplitudes were randomly generated hy the standard Gaussian distrihution. The 
noisy observations were obtained by adding centered, uncorrelated Gaussian noise to the observations, 
with variance 0.25, providing SNR?« 6dB. 

The proposed technique was applied by A = 2 and a = 0.5, together with the time evolution parameters 
(5i(0) = 0.1 and gq = ax = 0.03. We also considered instantaneous estimation by RELAX and enhanced 
the results by PHD filtering. Eor the latter case, the parameters /3 = 0.99, a = 0.01, (5(0) = 10“^, 
p = 0.04 and Ge = 0.01 are selected. Moreover, the subspace technique in (iTTl) is used with a = 2, 
adjusted for the best result. 

In terms of missed detection, false alarm and error, figures [T] |2] and |3] depicf fhe average qualify of fhe 
resulfing esfimafes over fime, respectively. Af a specific fime, fhe number of false alarms, and missed 
detections are simply calculated as the number of exceeding or lacking parameters, namely {ht — nt)+ 
and {n-t — ht)+, respectively. The error is calculated by adding the square error over the best assignment 
between estimates and the true parameters. 

As seen, the instantaneous RELAX estimator typically has a high false alarm rate. The PHD filter 
substantially improve both the false alarm, and the error properties of the RELAX method, but increases 
the missed detection rate. Changing the parameters of the PHD filter modifies fhe frade off befween false 
alarm and missed detection, but may not improve both. On the other hand, the proposed technique has 
improved miss-detection properties, but slightly increases the error level. This is due to the mismatch 
between the exact model in ([Til and the applied one in ([8]), which is well known to result in biased 
estimates. It is clearly seen that the proposed technique initially needs about 40 samples to achieve its 
steady behavior, but later rapidly adapts itself to a varying environment. This may imply an improper 
choice of initial parameters. Einally, notice that the proposed technique provides better results at the 
crossing point, suggesting that the proposed method relies more on the time correlation of parameters. 
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Time 


Fig. 1. Missed detection rate in the deterministic crossing setup, averaged over 16000 trials. 



Fig. 2. False alarm rate in the deterministic crossing setup, averaged over 16000 trials. 



Fig. 3. Mean square error in the deterministic crossing setup, averaged over 16000 trials. 


This can also be seen from the fact that in Figure the proposed technique corresponds to a smoother 
curve than the other techniques, showing higher temporal correlation between the estimates. 


April 3, 2015 


DRAFT 



























JOURNAL OF UTbX CLASS FILES, VOL. 6, NO. 1, JANUARY 2007 


17 



Fig. 4. Missed Detection rate in the sudden movement setup, averaged over 16000 trials. 


2) Single Target with a Sudden Change: In a different setup, we considered a single target 9. The 
target is assumed to be at rest for the first 100 samples, i.e. 6{t) = —7r/2 for f = 1,..., 100. Next, it 
started a linear movement with an impulsive initial position change, given by 6{t) = 3*7r/2 — 0.01 *7^*1 
for t = 101,... ,2000. 

The proposed technique was compared to sliding window, with the window function Wr = rj'^. This 
choice generally simplifies the calculations, since it leads to a recursive evaluation of the matrix as 

Rt+i = r/Rt + x(f)x(f)'^ (39) 

where Rj is defined in (|3T]) . It is interesting to see that the overall recursive calculation of R^ in the 
proposed algorithm is similar to (l39l) . when the forgetting factor is replaced by a time-varying parameter. 
We also used the SPICE technique to solve (ISTl) or equivalently (l29l) . leading to the same optimization 
in (fT9l) . when R^ and X{9) are replaced by R^ and Ao/(l — rj), respectively. From this perspective, the 
proposed method is a sliding window technique with a SPICE estimator, where adaptive forgetting factor 
and weights are utilized. 

Figures HI [51 [6] depict the average missed detection, false alarm and error results, respectively, where 
the same parameters as the previous setup and rj = 0.8 were used. In the initial stationary phase, the 
sliding window technique outperforms the proposed one, since the setup fits the assumptions of the sliding 
window. In terms of error, both techniques rapidly adapt to the sudden change, but the sliding window 
has a longer transient in terms of false alarm rate. 
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Fig. 5. False alarm rate in the sudden movement setup, averaged over 16000 trials. 



Fig. 6. Mean square error in the in the sudden movement setup, averaged over 16000 trials. 


V. Concluding Remarks 

In this paper, the prohlem of filtering a variable number of parameters in difficult scenarios was 
discussed. We used a recent modified Bayesian model in |[39l and relafed it to a RFS-based evolution 
model to obtain a consistent representation for our problem of interest. Next, we approximated the 
corresponding recursive Bayesian filter to our model, and obtained a tractable filter. We simplified the 
design to avoid heavy computations. This led to a filter based on two components; An approximate data 
covariance matrix, and a weight function, controlling miss-detection over the space of parameters. 

As the numerical experiments suggest, the technique is more robust to observation impairments and 
is more flexible against rapid movements. Our approach exploits, and is highly connected to the SPICE 
technique. Hence, it exhibits similar behavior. For example, it has a relatively short convergence rate and 
provides consistent estimates, but the effect of noise is not symmetric on the estimates. Mathematically 
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speaking, the estimates have a small statistical bias, proportional to the noise power. The method also 
exhibits a robust behavior in a low SNR regime. 

Herein, the emphasis was on simplifying calculations at each recursion by avoiding difficulties with 
the grid-based spectral manipulations and instead combining approximate information of different time 
instants to maintain performance. As observed by simulations, this is favorable in a low SNR case, where 
fusing multiple observations is necessary to obtain a reliable estimate. However, the method might be 
improved if complexity is not an immediate concern and a more complex approximation is desirable. 
Moreover, the possibility of bootstrapping and the application of particle filters should not be ruled out. 

Appendix A 

Calculus of Random Finite Sets 

A. Functional Representation 

To perform RBF, we need to calculate posteriors over finite sets, involving integration over RFS 
densities. Here, we review how this can be accomplished. In general, the probability distributions over 
the set of all finite sets can be represented by a sequence of real functions. For example, the marginal 
state distribution pt{St) may be represented by the function sequence {q^'^ : x > M+j defined 

by 

qf Vl, • • • , . . . ,In) = PtASt = , {On,In)}) (40) 

Note that the functions are symmetric under the permutation of the pairs {9^,1^), since the state set 
is invariant under such a transform. Moreover, for a fixed n, 

J (7f)(0i,...,0„,Ii,...,X„)dWX = n! xp(nt = n) (41) 

K”xK!f. 

The reason is that the left hand side integration hits each set St of order n exactly n! times by different 
permutations of parameters, but does not hit a set St of a different order. In the same manner, the transition 
probability Q can be expressed by the following sequence of functions 

...,en,ii,... ,in, ... ,x;,) = 

Q{S = m,Ik)},S' = {{e},I',)}) (42) 


B. Integration 

In general integration over the set of random finite sets can be explained in terms of the above functional 
representation. Consider the marginal distribution over the step of finite sets St, represented by sequence 
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of functions and take a function f{S) : 5 ^ M. Then, we have that 

/’/(5)d5 = f;^ f g"(0i,...,0„,Xi,...,I„)dWX (43) 

s e^xR" 

Notice how division hy n\ cancels the aforementioned effect of multiple recalculation. Other integrations 
such as marginalization in ([T3l) can he carried out in a similar manner. For example, suppose that the 
posterior p{St \ is represented hy functions at a certain time t. Then, the integration in (fTSl) 

yields to 

p(5t+i = {(0fc,Xfc)} |XW) = 

(44) 

where the similar argument of q^'^X) (q (|4^ is neglected. 


OO 

n '=0 0n'xR5:' 


Appendix B 

RFS Local Approximation 

Consider a distribution in the family, given hy (fT4b . and suppose that the parameters R and A are large. 
Take S = {(0i,Xi),..., {On,In)} as the maximum prohahility point. Then, a large deviation from S leads 
to a considerable probability reduction. Thus, we may assume that the deviation is small. Hence, local 
Taylor expansion can be applied. Note that a small deviation from the set S includes small perturbations 
leading to a typical hyper-state set 

5 = m + + Alfc)} u {{d{,xl ),... ( 04 ,x^)} ( 45 ) 

where the parameters, indexed with / are additional. Furthermore, the parameters A0fc, AX^ and X^ are 
assumed to be small. The negative log-density function is written as 


-logp(5; R, A) = 


Tr 


R ( (T I -|- E(^fc d" AX/j)a(0fc -|- A0fc)a {6^ -I- A0fc)-|- 

k 


El/a(ei)a''(«h 


-1 


-f 


E X{0k + A0fc)(Xfc + Aik) + E Kob^i 

k k 


We may now apply the Taylor expansion. 


(46) 
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A. Poisson Approximation 

Due to the local minimality of S, it turns out that the effect of and vanish up to the first 
order. This means the negative log-distrihution can he written as 


- log p{S] R, A) = 


Tr 


R(Ro + E4a(0{)a^(0{; 


-1 


+ 


T.m){ik) + T.K0i)4 


(47) 


where Rq = R(S'). Using the matrix inversion lemma and neglecting the cross-product terms we 

obtain 


-logp(S’; R, A) = 


-\ogp{S] R,A) + X; (A(6»{)lf 


a^(e{)R-^RR-^a(gUxn 
l-Fa^(0{)RRo^A(0{)I,^ ) 


(48) 


This shows that up to the first order, the behavior of the RFS can be identified by the Poisson process 
of additional elements with density 

w{9,I) = e V l+a^^(e)RR„ la(e)X J ( 49 ^ 


B. Extended Laplace’s Method 

To capture the behavior of A0fc and AI^, we need to consider the higher order terms. However, we 
neglect the cross-product terms in favor of numerical simplicity, and according to the fact they are often 
smaller due to low coherency in the basis manifold. Then after straightforward calculations, we obtain 
that 


-logp(5; R, A) = 
-logp(5; R,A) - X;iogu;(6'{,x{)- 

k 

k Eki^GkfGk + (AlkfHk 


where 


Gk — ^Tr 


R^ c72l + ^Xfca(4)a^(4) 


H, = -fjTr 


R(a"l + EXfea(0fc)a^(0fc) 


-1 


-1 


This implies that A0fc ~ Af{0, ^) and AX^ ~ A/’(0, ^). 


(50) 


(51) 
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Appendix C 

Perturbative KL-based Projection 
Suppose that the distrihution p{St+i \ is calculated as 

p{St+i = S I xW) « p{St = S I xW) + Ap{S) = p{s-, R+, A+) + Ap{S) 


(52) 


The question of interest is to find the perturbation in parameters minimizing the KL distance between 
p{St+i = S I and the parametric model, i.e to solve 


f{p{S; R+,A+) + Ap(5)) 
log (p{S; Rt+ + AR, X+ + AA)) dS 


arg mm — 
AR,AA s 


(53) 


Although this can be generally solved up to the first order, by the technique explained below, we restrict 
AR to the be 7 R^ for 7 > 0 to simplify calculations, and also to ensure positive semi-definiteness. 
After Taylor expansion, and performing the minimization, we obtain that 


7 = 


f Ap(S)dS 

'fp(S; U=odS 


(54) 


and 


AX{e) = - 


/ Ap(S)dS 

Jp(S; 


(55) 


We obtain the desired update by the above relations. We further simplify this relation in favor of low 
complexity. Using the approximation in (l50l) and after straightforward manipulations, we get that 




E ^AGfc + E+ f 


7 = 


©xRj. 


^^Au;d0dl 


E 


Gl 


+ E 

k 


V ^7 


m 


+ / 


f Qlogi. 


(56) 


and 


0xR+ 


V ^7 


a;d 6 »d 2 : 


AX{9) = 


f 




(57) 


This can be further simplified noting that the terms logo;, and are linear in I + 7 thus their partial 
derivative with respect to 7 equals their value at 7 = 0 , leading to 


7 = 


E^ + E^+ f logw X Au;d 6 »dX 
k k 0xR+ 


2n+ f (log uj)^ ujdffdl 

0xR+ 


(58) 
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According to the empirical observation that the terms involving oj are substantially smaller than the other 


terms, we simplify the calculations more by neglecting them to obtain 

k k 


(59) 


The expression in dSTl) can also be simplified by considering that Aw ^ S, and approximating w as 




(60) 


Simple calculations lead to 

AA = -i(A(0) - {e)-R-\t)-R+-R-\t)a^(9)) jX5{e,Z)&I (61) 

R+ 
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